Proximity Graphs for Clustering and Manifold Learning
Abstract
Many machine learning algorithms for clustering or dimensionality reduction take as input a cloud of points in Euclidean space and construct a graph with the input data points as vertices. This graph is then partitioned (clustering) or used to redefine metric information (dimensionality reduction). There has been much recent work on new methods for graph-based clustering and dimensionality reduction, but little on constructing the graph itself. Graphs typically used include the fully connected graph, a local fixed-grid graph (for image segmentation), or a nearest-neighbor graph. We suggest that the graph should adapt locally to the structure of the data. This can be achieved by a graph ensemble that combines multiple minimum spanning trees, each fit to a perturbed version of the data set. We show that such a graph ensemble usually produces a better representation of the data manifold than standard methods, and that it provides robustness to a subsequent clustering or dimensionality reduction algorithm based on the graph.
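The ensemble idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the perturbation model or how the trees are combined, so the choices below are assumptions made for illustration only: isotropic Gaussian jitter of each point, and combination by taking the union of tree edges while counting how many trees contain each edge. The function names `mst_edges` and `perturbed_mst_ensemble` and the parameters `n_trees`, `noise_scale`, and `seed` are hypothetical.

```python
import math
import random


def mst_edges(points):
    """Minimum spanning tree of the complete Euclidean graph (Prim's algorithm).

    Returns a set of undirected edges (i, j) with i < j.
    """
    n = len(points)
    if n < 2:
        return set()
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex giving that connection
    best_dist[0] = 0.0           # start growing the tree from vertex 0
    edges = set()
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.add((min(u, best_from[u]), max(u, best_from[u])))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def perturbed_mst_ensemble(points, n_trees=10, noise_scale=0.05, seed=0):
    """Union of MSTs, each fit to a jittered copy of the data.

    Assumption (not from the source): the perturbation is isotropic
    Gaussian noise with standard deviation `noise_scale`, and the
    ensemble is the union of edges. Returns {edge: number of trees
    containing that edge}.
    """
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_trees):
        jittered = [tuple(x + rng.gauss(0.0, noise_scale) for x in p)
                    for p in points]
        for edge in mst_edges(jittered):
            counts[edge] = counts.get(edge, 0) + 1
    return counts


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (1.1, 1.0)]
    for edge, count in sorted(perturbed_mst_ensemble(data).items()):
        print(edge, count)
```

Under these assumptions, edges that appear in most trees are stable under perturbation, while edges that appear only occasionally reflect locally ambiguous structure. The edge counts could serve as weights for a later clustering or embedding step, though that use is also a choice of this sketch, not something stated in the abstract.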
Related resources
Schroedinger Eigenmaps with Nondiagonal Potentials for Spatial-Spectral Clustering of Hyperspectral Imagery
Schroedinger Eigenmaps (SE) has recently emerged as a powerful graph-based technique for semi-supervised manifold learning and recovery. By extending the Laplacian of a graph constructed from hyperspectral imagery to incorporate barrier or cluster potentials, SE enables machine learning techniques that employ expert/labeled information provided at a subset of pixels. In this paper, we show how ...
Graph clustering using heat kernel embedding and spectral geometry
In this paper we study the manifold embedding of graphs resulting from the Young-Householder decomposition of the heat kernel. We aim to explore how the sectional curvature associated with the embedding can be used as a feature for gauging the similarity of graphs, and hence clustering them. The curvature is computed from the difference between the geodesic (edge weight) and the E...
Integrating Spatial Proximity with Manifold Learning
Dimension reduction is a useful preprocessing step for many types of hyperspectral image analysis, including visualization, regression, clustering and classification. Dimension reduction maps high-dimensional data into a lower-dimensional space while preserving the important features of the original data according to a given criterion. Although linear dimension reduction methods such ...
A Multi-Objective Approach to Fuzzy Clustering using ITLBO Algorithm
Data clustering is one of the most important areas of research in data mining and knowledge discovery. Recent research in this area has shown that the best clustering results can be achieved using multi-objective methods. In other words, assuming more than one criterion as objective functions for clustering data can measurably increase the quality of clustering. In this study, a model with two ...
Label Propagation for Semi-Supervised Learning in Self-Organizing Maps
Semi-supervised learning aims at discovering spatial structures in high-dimensional input spaces when insufficient background information about clusters is available. A particularly interesting approach is based on propagation of class labels through proximity graphs. The Emergent Self-Organizing Map (ESOM) itself can be seen as such a proximity graph that is suitable for label propagation. It t...